Abstract. The problems of random projections and sparse reconstruction
have much in common and have individually received much attention. Surprisingly,
until now they progressed in parallel and remained mostly separate. Here,
we employ new tools from probability in Banach spaces that were successfully
used in the context of sparse reconstruction to advance on an open problem in
random projection. In particular, we generalize and use an intricate result by
Rudelson and Vershynin for sparse reconstruction which uses Dudley's theorem
for bounding Gaussian processes. Our main result states that any set of
$N = \exp(\tilde{O}(n))$ real vectors in $n$-dimensional space can be linearly
mapped to a space of dimension $k = O(\log N \,\mathrm{polylog}(n))$, while
(1) preserving the pairwise distances among the vectors to within any constant
distortion and (2) being able to apply the transformation in time $O(n \log n)$
on each vector. This improves on the best known $N = \exp(\tilde{O}(n^{1/2}))$
achieved by Ailon and Liberty and $N = \exp(\tilde{O}(n^{1/3}))$ by Ailon and
Chazelle. The dependence on the distortion constant, however, is believed to be
suboptimal and subject to further investigation. For constant distortion, this
settles the open question posed by these authors up to a $\mathrm{polylog}(n)$
factor while considerably simplifying their constructions.
1. Introduction

Designing computationally efficient transformations that reduce dimensionality
of data while approximately preserving its metric information lies at the heart of
many problems. While in compressed sensing such techniques are sought for sparse
data in a real or complex metric space (with respect to some basis), in random
projections, following the seminal work of Johnson and Lindenstrauss, one seeks to
reduce the dimension of any finite set of data.¹ In both applications, random
matrices of a suitable size [1] [2] [3] [4] result in optimal constructions [5] in
the parameters $n$ (the original dimension), $k$ (the target dimension), $N$ (the
number of input vectors) and $\delta$ (the distortion). However, the running time
of these constructions, measured as the number of operations needed to map a
vector, is suboptimal. A major open question is that of designing such matrix
distributions that can be applied efficiently to any vector, with optimal
dependence on the parameters $n$, $k$, $N$ and $\delta$. Applications for such
transformations were found, e.g., in designing fast approximation algorithms for
solving large scale linear algebraic operations (e.g. [6], [7]).

¹ [...] and became synonymous with the process of approximate metric preserving
dimension reduction using randomized linear mappings. However, these linear
mappings need not be (and indeed are usually not) projections in the linear
algebraic sense of the word.

NIR AILON AND EDO LIBERTY

The two lines of work, though sharing much in common, have mostly progressed
in parallel. Here we combine recent work on bounds for sparse reconstruction to
improve the bounds of Ailon and Chazelle [8] [9] and Ailon and Liberty [10]
on fast random projections, also known as Fast Johnson-Lindenstrauss
transformations. The new bounds allow obtaining the well known Fast
Johnson-Lindenstrauss Transform for finite sets of bounded cardinality
$N = \exp(\tilde{O}(n))$, where $n$ is the original dimension. The best bound
known so far was obtained by Ailon and Liberty for sets of size up to
$N = \exp\{\tilde{O}(n^{1/2})\}$.² The latter improved on Ailon and Chazelle's
original bound of $N = \exp\{\tilde{O}(n^{1/3})\}$, which initiated the
construction of Fast Johnson-Lindenstrauss Transforms. We also mention Dasgupta
et al.'s work [11] on the construction of Johnson-Lindenstrauss random matrices
which can be more efficiently applied to sparse vectors, with applications in the
streaming model, and Ailon et al.'s work [12] on the design of
Johnson-Lindenstrauss matrices that run in linear time under certain assumptions
on various norms of the input vectors.

The transformation we derive here is a composition of two random matrices: a
random sign matrix and a random selection of a suitable number $k$ of rows from
a Fourier matrix, where $k = O(\delta^{-4}(\log N)\,\mathrm{polylog}(n))$ and
$\delta$ is the tolerated distortion level. The result, for constant $\delta$, is
believed to be suboptimal within the $\mathrm{polylog}(n)$ factor in the target
dimension $k$. The running time of performing the transformation on a vector is
dominated by the $O(n \log n)$ of the Fast Fourier Transform, and is believed to
be optimal. The possibility of obtaining such a running time for fixed distortion
was left as an open problem in Ailon and Chazelle's and Ailon and Liberty's work,
and here we resolve it up to a factor of $\mathrm{polylog}(n)$. The dependence on
the constant $\delta$ is also believed to be suboptimal, and the "correct"
dependence should be $\delta^{-2}$. The question of improving this dependence is
left as an open problem.
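The composition described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: it uses the Hadamard matrix in place of a general Fourier matrix, it draws the $k$ rows with replacement, and the function names `fwht`, `sample_transform` and `apply_transform` are ours, not taken from any of the cited works. The dominant cost is the $O(n \log n)$ fast Walsh-Hadamard transform.

```python
import math
import random


def fwht(vec):
    """Unnormalized fast Walsh-Hadamard transform, O(n log n); n must be a power of 2."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def sample_transform(n, k, rng):
    """Draw the random ingredients: a sign vector b and k Hadamard row indices."""
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    # Assumption for this sketch: rows are drawn independently, with replacement.
    rows = [rng.randrange(n) for _ in range(k)]
    return signs, rows


def apply_transform(x, signs, rows):
    """Return (1/sqrt(k)) * Phi * D_b * x, where Phi keeps the selected Hadamard rows."""
    flipped = [s * xi for s, xi in zip(signs, x)]   # D_b x
    spectrum = fwht(flipped)                        # H D_b x
    scale = 1.0 / math.sqrt(len(rows))
    return [scale * spectrum[i] for i in rows]      # subsample and normalize
```

For example, `apply_transform(x, *sample_transform(len(x), k, random.Random(0)))` maps a vector `x` whose length is a power of 2 to `k` coordinates. Computing the full transform and then subsampling keeps the sketch short; it does not attempt to exploit $k < n$.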

A combination of random sign matrices and various forms of subsampled
Fourier matrices was also used in the work of Ailon and Chazelle [8] and later
Ailon and Liberty [10], as well as that of Matousek [13]. Here we obtain an
improved analysis using recent work by Rudelson and Vershynin for sparse
reconstruction [14].

1.1. Restricted Isometry. An underlying idea common to both random projections
and sparse reconstruction is the preservation of metric information under a
dimension reducing transformation. In sparse reconstruction theory, this property
is known as restricted isometry [15] [16]. A matrix $\Phi$ is a restricted
isometry with sparseness parameter $r$ if for some $\delta > 0$,

(1.1)    $\forall\, r\text{-sparse } y \in \mathbb{R}^n: \quad (1-\delta)\|y\|_2 \le \|\Phi y\|_2 \le (1+\delta)\|y\|_2 .$

By $r$-sparse $y$ we mean vectors in $\mathbb{R}^n$ with all but at most $r$
coordinates zero. It was shown in [15] that the restricted isometry property is
sufficient for the purpose of perfect reconstruction of sparse vectors,
compressed sensing being one of the prominent applications.
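To make condition (1.1) concrete, here is a small Python sketch that tests the two-sided inequality for a single vector. The helper names (`is_r_sparse`, `mat_vec`, `norm2`, `satisfies_isometry`) are hypothetical, and checking one vector at a time is of course not a certificate of restricted isometry, which quantifies over all $r$-sparse vectors.

```python
import math


def is_r_sparse(y, r):
    """True if all but at most r coordinates of y are zero."""
    return sum(1 for v in y if v != 0) <= r


def mat_vec(phi, y):
    """Product of a matrix given as a list of rows with the vector y."""
    return [sum(a * v for a, v in zip(row, y)) for row in phi]


def norm2(v):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in v))


def satisfies_isometry(phi, y, delta):
    """Check (1 - delta)||y|| <= ||phi y|| <= (1 + delta)||y|| for this one y."""
    ny, nphi = norm2(y), norm2(mat_vec(phi, y))
    return (1 - delta) * ny <= nphi <= (1 + delta) * ny
```

As a toy case, take `phi` to be the two rows `[1, 1, 1, 1]` and `[1, -1, 1, -1]` scaled by `1 / sqrt(2)`. Every 1-sparse vector is mapped isometrically, while the 2-sparse vector `[1, 0, -1, 0]` is mapped to zero, so this `phi` is a restricted isometry only for sparseness parameter 1.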

In [17], Rudelson and Vershynin construct a distribution over $k \times n$
matrices $\Phi$ such that, with high probability, $\Phi$ has the restricted
isometry property with sparseness parameter $r$ and arbitrarily small
$\delta > 0$.³

² The notation $\tilde{O}(\cdot)$ suppresses arbitrarily small polynomial
coefficients and polylogarithmic factors.

³ Their analysis is done over the complex field, but we restrict the discussion
to the reals here.

ALMOST OPTIMAL UNRESTRICTED FAST JOHNSON-LINDENSTRAUSS TRANSFORM

In their analysis, $k = O(\delta^{-2} r \log(n) \cdot \log^2(r) \log(r \log n))$
and $\Phi$ can be applied (to a given vector $x$) in running time $O(n \log n)$.
Assuming $r$ polynomial in $n$, this takes the simpler form of
$k = O(\delta^{-2} r \log^4 n)$.⁴ In fact, $\Phi$ is (up to a constant) nothing
other than a random choice of $k$ rows from the (unnormalized) Hadamard matrix,
defined as $\Psi_{\omega,t} = (-1)^{\langle \omega, t \rangle}$, where
$\langle \cdot, \cdot \rangle$ is the dot product over the binary field, $n$ is
assumed to be a power of 2 and $\omega, t$ are thought of as
$\log n$-dimensional vectors over the binary field in the obvious way.⁵ As a
corollary of the result, one obtains a universal matrix for reconstructing sparse
signals, which can be applied to a vector in time $O(n \log n)$. The conjecture
is that the same distribution with $k = O(\delta^{-2} r \log n)$ should work as
well, but this is a major open question beyond the scope of this work. For an
excellent survey explaining how restricted isometry can be used for sparse
reconstruction, and why designing such matrices with good computational
properties is important, we refer the reader to [18] and to references therein.
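The definition of the Hadamard entries via the binary dot product can be spelled out in a few lines of Python. This is a sketch for illustration; the names `hadamard_entry` and `hadamard_row` are ours, and indices are 0-based integers whose binary expansions play the role of $\omega$ and $t$.

```python
def hadamard_entry(w, t):
    """(-1)^<w, t>, where w and t are read as bit vectors over the binary field."""
    # The binary dot product is the parity of the number of common 1-bits.
    return -1 if bin(w & t).count("1") % 2 else 1


def hadamard_row(w, n):
    """Row w of the unnormalized n x n Hadamard matrix (n a power of 2)."""
    return [hadamard_entry(w, t) for t in range(n)]
```

Distinct rows produced this way are orthogonal and every row has squared Euclidean norm $n$, which is why any $k$ selected rows give columns of Euclidean norm $\sqrt{k}$.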

Independently, Ailon and Chazelle [8] and Ailon and Liberty [10] were interested
in constructing a distribution of $k \times n$ matrices $\Phi$ such that for any
set $Y \subseteq \mathbb{R}^n$ of cardinality $N$, one gets

(1.2)    $\forall\, y \in Y: \quad (1-\delta)\|y\|_2 \le \|\Phi y\|_2 \le (1+\delta)\|y\|_2 ,$

with constant probability. Additionally, the number of steps required for applying
$\Phi$ on any given $x$ is $O(n \log n)$. In their results $k$ was taken as
$O(\delta^{-2} \log N)$, which is also essentially the best possible [5].
Unfortunately, both results break down when $k = \Omega(n^{1/2})$.⁶ Assuming the
tolerance parameter $\delta$ fixed, this limitation can be rephrased as follows:
the techniques fail when the number of vectors $N$ is in
$\exp\{\Omega(n^{1/2})\}$.

In both Ailon and Chazelle's [8] and Ailon and Liberty's [10] results, as well as
in previous work [1] [2] [3] [4], the bounds (1.2) are obtained by proving strong
tail bounds on the distribution of the estimator $\|\Phi y\|_2$, and then applying
a simple union bound on the finite collection $Y$. It is worth a moment's thought
to realize that Ailon and Chazelle's result, as well as that of Ailon and Liberty,
can be used for restricted isometry as well. Indeed, a simple epsilon-net argument
for the set of $r$-sparse vectors can turn that set into a finite set of
$\exp\{O(r \log n)\}$ vectors, on which a union bound can be applied. However, the
current limitation of random projections mentioned above will limit $r$ to be in
$n^{1/2-\mu}$ (for arbitrarily small $\mu$). Interestingly, Rudelson and
Vershynin's result does not break down for $r$ polynomial in $n$. A careful
inspection of their techniques reveals that instead of union bounding on a finite
set of strongly concentrated random variables, they use a result due to Dudley to
bound extreme values of Gaussian processes. Can this idea be used to improve [8]
and [10]? Intuitively there is no reason why a result which is designed
for preserving the metric of sparse vectors should help with preserving the metric
of any finite set of vectors. It turns out, luckily, that such a reduction can be
done, though not in an immediate way. A suitable generalization of Rudelson and
Vershynin's result (Section 2), combined with Ailon and Chazelle's [8] and Ailon
and Liberty's [10] method of random sign matrix preconditioning, achieves this in
Section 3.

⁴ In their work, the dependence of $k$ on $\delta$ is not analyzed because
$\delta$ is assumed to be fixed (for sparse signal reconstruction purposes, this
dependence is not important). It is not hard to derive the quadratic dependence
of $k$ on $\delta^{-1}$ from their work.

⁵ Rudelson and Vershynin use the complex Discrete Fourier Transform matrix, but
their analysis does not change when using the Hadamard matrix.

⁶ Ailon and Chazelle [8] and Ailon and Liberty [10] used $d$ to denote the data
dimension, $n$ its cardinality and $\varepsilon$ the sought distortion bound. Here
we follow Rudelson and Vershynin's convention, using $n$ to denote the dimension
and $\delta$ the distortion bound. We now use $N$ to denote the data cardinality.

1.2. Notation. In what follows, we fix $N$ to denote the cardinality of a set $Y$
of vectors in $\mathbb{R}^n$, where $n$ is fixed. We also fix a distortion
parameter $\delta \in (0, 1/2]$, and define $k$ to be an integer in
$\Theta(\delta^{-4}(\log N)(\log^4 n))$.

Now let $\Phi$ be a random $k \times n$ matrix obtained by picking $k$ random rows
from the unnormalized $n \times n$ Hadamard matrix (the Euclidean norm of each
column of $\Phi$ is $\sqrt{k}$). Let $\Omega$ denote the probability space for the
choice of $\Phi$.

Let $b$ denote a uniformly chosen vector in $\{-1, 1\}^n$, and let $\Gamma$ denote
the probability space on the choice of $b$. For a vector $y \in \mathbb{R}^n$, we
denote by $D_y$ the diagonal $n \times n$ matrix with the coordinates of $y$ on
the diagonal. For a real matrix, $\|\cdot\|$ denotes its spectral norm and
$(\cdot)^*$ its transpose. For a set $T \subseteq \{1, \dots, n\}$, we let
$\mathrm{Id}_T$ denote the diagonal matrix with $\mathrm{Id}_T(i,i) = 1$ if
$i \in T$, and $0$ otherwise. For a vector $y \in \mathbb{R}^n$, let
$\mathrm{supp}(y)$ denote the support of $y$, namely, its set of nonzero
coordinates. For a number $p \ge 1$, let $B_p \subseteq \mathbb{R}^n$ denote the
set of vectors $y \in \mathbb{R}^n$ with $\|y\|_p \le 1$, and $\alpha B_p$ the set
of vectors $y \in \mathbb{R}^n$ for which $\|y\|_p \le \alpha$.
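For readers who prefer code to notation, the objects just introduced can be mirrored by a few Python helpers. This is only a sketch: the names `diag_apply`, `indicator_diag` and `in_scaled_ball` are hypothetical, diagonal matrices are stored as plain lists of their diagonal entries, and indices are 0-based. Note that $D_y b = D_b y$, since both are the coordinate-wise product of $y$ and $b$.

```python
def diag_apply(d, v):
    """Multiply the diagonal matrix D_d by the vector v (coordinate-wise product)."""
    return [di * vi for di, vi in zip(d, v)]


def indicator_diag(T, n):
    """Diagonal of Id_T: 1 at the positions in T, 0 elsewhere (0-based here)."""
    return [1 if i in T else 0 for i in range(n)]


def in_scaled_ball(y, p, alpha):
    """Membership test for alpha * B_p; p = float('inf') gives the max norm."""
    if p == float("inf"):
        norm = max(abs(v) for v in y)
    else:
        norm = sum(abs(v) ** p for v in y) ** (1.0 / p)
    return norm <= alpha
```

With these helpers, the set $B_2 \cap \alpha B_\infty$ used in Section 2 corresponds to vectors `y` for which both `in_scaled_ball(y, 2, 1.0)` and `in_scaled_ball(y, float("inf"), alpha)` hold.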

2. Restricted isometry result generalization 

We follow the main path of Rudelson et al. in [17] to prove a more general 
formulation of their main theorem which is more suitable for us here. 



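Theorem 2.1. [Derived from Rudelson and Vershynin [17]] Let $\alpha > 0$ be any
real number. Define $E_\alpha$ as

(2.1)    $E_\alpha = \mathbb{E}_\Omega \left[ \sup_{y \in B_2 \cap \alpha B_\infty} \left\| D_y^2 - \frac{1}{k} D_y \Phi^* \Phi D_y \right\| \right] .$

Then for some global $C_1 > 0$,

(2.2)    $E_\alpha \le \frac{C_1 \log^{3/2}(n) \log^{1/2}(k)}{\sqrt{k}} \left( E_\alpha + \alpha^2 \right)^{1/2} .$

In particular, if $\frac{\log^{3/2}(n) \log^{1/2}(k)}{\sqrt{k}} = O(\alpha)$, then

(2.3)    $E_\alpha \le O\!\left( \frac{\alpha (\log^{3/2} n)(\log^{1/2} k)}{\sqrt{k}} \right) .$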
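The step from (2.2) to (2.3) is elementary; we spell it out for convenience. The abbreviation $c$ below is ours and is introduced only for this derivation.

```latex
% Abbreviation (ours): c := C_1 \log^{3/2}(n) \log^{1/2}(k) / \sqrt{k}, so (2.2) reads
% E_\alpha \le c (E_\alpha + \alpha^2)^{1/2}.
\begin{align*}
E_\alpha \le c\,(E_\alpha + \alpha^2)^{1/2}
  &\;\Longrightarrow\; E_\alpha^2 \le c^2 E_\alpha + c^2 \alpha^2 \\
  &\;\Longrightarrow\; E_\alpha \le \tfrac{1}{2}\Bigl( c^2 + \sqrt{c^4 + 4 c^2 \alpha^2} \Bigr)
   \le c^2 + c\,\alpha ,
\end{align*}
% using \sqrt{c^4 + 4 c^2 \alpha^2} \le c^2 + 2 c \alpha.
% If c = O(\alpha), then c^2 = O(c\,\alpha), hence E_\alpha = O(c\,\alpha), which is (2.3).
```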



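The proof we present is an adaptation of the proof of Theorem 3.4 in [17] to a
more general setting. In fact, the latter theorem can be obtained as an easy
consequence of Theorem 2.1 by replacing $\sup_{y \in B_2 \cap \alpha B_\infty}$ in
(2.1) by $\sup_{y \in \frac{1}{\sqrt{r}} Y_r}$, where $Y_r \subseteq \mathbb{R}^n$
is defined as the set of vectors with at most $r$ coordinates equalling 1 and
the remaining coordinates zero. Indeed,
$\frac{1}{\sqrt{r}} Y_r \subseteq B_2 \cap r^{-1/2} B_\infty$. We can therefore
conclude that for $\alpha = \frac{1}{\sqrt{r}}$, by definition,

$\mathbb{E}_\Omega \left[ \sup_{y \in \frac{1}{\sqrt{r}} Y_r} \left\| D_y^2 - \frac{1}{k} D_y \Phi^* \Phi D_y \right\| \right] \le E_\alpha .$

If we also assume that $k = \Omega(r \log^4 n)$, then (2.3) will hold, from which
we conclude that

(2.4)    $\mathbb{E}_\Omega \left[ \sup_{y \in \frac{1}{\sqrt{r}} Y_r} \left\| D_y^2 - \frac{1}{k} D_y \Phi^* \Phi D_y \right\| \right] \le O\!\left( \frac{(\log^{3/2} n)(\log^{1/2} k)}{r^{1/2} k^{1/2}} \right) .$

Now we notice that $D_y = \frac{1}{\sqrt{r}} \mathrm{Id}_{\mathrm{supp}(y)}$,
where for a set of indexes $T$ the diagonal matrix $\mathrm{Id}_T$ (as defined in
[17]) has 1 in diagonal position $i$ if and only if $i \in T$. Using this
observation and multiplying (2.4) by $r$ we conclude that

$\mathbb{E}_\Omega \left[ \sup_{|T| \le r} \left\| \mathrm{Id}_T - \frac{1}{k} \mathrm{Id}_T \Phi^* \Phi\, \mathrm{Id}_T \right\| \right] \le O\!\left( \frac{\sqrt{r}\,(\log^{3/2} n)(\log^{1/2} k)}{\sqrt{k}} \right) ,$

which is exactly the main result of Rudelson and Vershynin in [17] for restricted
isometry.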

The proof of Theorem 2.1 below points out the necessary changes to the proof of
Theorem 3.4 in [17]. The difference between the theorems is that in our case, the
supremum in the definition of $E_\alpha$ is taken not only over the set of sparse
vectors, but over a richer set. It turns out, however, that [17] uses sparsity in
a very limited way: in fact, the dominating effect of sparsity there is obtained
using the fact that the $L_1$ norm of a sparse vector is small compared to its
$L_2$ norm. These arguments appear at the very end of their proof. To keep the
paper self-contained, we walk through the main milestones of the proof of
Theorem 3.4 in [17], and point out the changes necessary for our purposes. The
reader is nevertheless encouraged to refer to the enlightening exposition in [17]
first.

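Proof. Clearly $\mathbb{E}\left[\frac{1}{k} D_y \Phi^* \Phi D_y\right] = D_y^2$.
We define new independent random i.i.d. variables
$\epsilon_1, \dots, \epsilon_k$, each taking the values $+1, -1$ with equal
probability. Let $\Pi$ denote the probability space for
$\epsilon_1, \dots, \epsilon_k$. It suffices to prove (using a symmetrization
argument, see Lemma 6.3 in [19]) that

(2.5)    $\mathbb{E}_{\Omega \times \Pi} \left[ \sup_{y \in B_2 \cap \alpha B_\infty} \left\| \frac{1}{k} \sum_{i=1}^{k} \epsilon_i (x_i D_y)^* (x_i D_y) \right\| \right] \le \frac{2 C_1 \log^{3/2}(n) \log^{1/2}(k)}{\sqrt{k}} \left( E_\alpha + \alpha^2 \right)^{1/2} ,$

where $x_i$ is the (random) $i$'th row of $\Phi$. To that end, as claimed in [17]
(Lemma 3.5), if we can show that for any fixed choice of $\Phi$,

(2.6)    $\mathbb{E}_{\Pi} \left[ \sup_{y \in B_2 \cap \alpha B_\infty} \left\| \sum_{i=1}^{k} \epsilon_i (x_i D_y)^* (x_i D_y) \right\| \right] \le k_1 \sup_{y \in B_2 \cap \alpha B_\infty} \left\| \sum_{i=1}^{k} (x_i D_y)^* (x_i D_y) \right\|^{1/2}$

for some number $k_1$, then by taking $\mathbb{E}_\Omega$ on both sides and using
Jensen's inequality (to swap $(\cdot)^{1/2}$ on the RHS with $\mathbb{E}_\Omega$)
and the triangle inequality, the conclusion would be that

(2.7)    $E_\alpha \le \frac{2 k_1}{\sqrt{k}} \left( E_\alpha + \|D_y^2\| \right)^{1/2} .$

Since $\|D_y^2\| = \|y\|_\infty^2 \le \alpha^2$, we would get the stated result.
It thus suffices to prove (2.6) with
$k_1 = O((\log^{3/2} n)(\log^{1/2} k))$. To do so, [17] continue by replacing the
$k$ binary random variables $\epsilon_1, \dots, \epsilon_k$ in (2.6) with $k$
Gaussian random variables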
$g_1, \dots, g_k$ using a comparison principle (inequality (4.8) in [19]),
reducing the problem to that of bounding the expected extreme value of a Gaussian
process. Using Dudley's inequality (Theorem 11.17 in [19]), as Rudelson and
Vershynin do, one concludes that (2.6) will hold with $k_1$ taken as:

(2.8)    $\int_0^\infty \log^{1/2} \mathcal{N}(B, \|\cdot\|_X, u)\, du ,$

where:

• For a norm $\|\cdot\|_*$, a set $S$ and a number $u$,
$\mathcal{N}(S, \|\cdot\|_*, u)$ denotes the minimal number of balls of radius
$u$ in norm $\|\cdot\|_*$, centered at points of $S$, needed to cover the set $S$,

• $B$ is defined as $\bigcup_{y \in B_2 \cap \alpha B_\infty} B_y$, where
$B_y = \{D_y z : z \in B_2\}$, and

• $\|x\|_X = \max_{i \le k} |\langle x_i, x \rangle|$, where we remind the reader
that $x_i$ is the $i$'th row of $\Phi$.

Rudelson and Vershynin derive bounds on $\mathcal{N}(B_{RV}, \|\cdot\|_X, u)$ for
small $u$ and for large $u$ separately, where in their case $B_{RV}$ was the set
of $r$-sparse vectors of Euclidean norm 1 (denoted by $D_2^{r,n}$ in [17]). The
sparsity of the vectors in the set $B_{RV}$ is used in both derivations, as
follows:

• For large $u$, they use the containment argument (11) in [17], asserting that
$B_{RV} \subseteq \sqrt{r} B_1$. Note that by Cauchy-Schwarz and the definition
of $B$, $B \subseteq B_1$, hence we "gain" a factor of $\sqrt{r}$ when deriving
$k_1$.

• For small $u$, inequality (13) in [17] asserts that
$\mathcal{N}(B_{RV}, \|\cdot\|_X, u) \le d(n,r)(1 + 2/u)^r$, where $d(n,r)$ is
the number of ways to choose $r$ elements from a set of $n$ elements. Since the
best sparseness we can assume for vectors in $B$ here is trivially $n$, we
replace the expression $d(n,r)$ with $d(n,n) = 1$, and $(1 + 2/u)^r$ with
$(1 + 2/u)^n$.⁷

Rudelson and Vershynin then derive a bound for
$\int_0^\infty \log^{1/2} \mathcal{N}(B_{RV}, \|\cdot\|_X, u)\, du$ by balancing
the two bounds at $u = 1/\sqrt{r}$. In our case we balance at $u = 1/\sqrt{n}$.
The net result will lead to a $k_1$ which is as the one in the statement of
Lemma 3.5 in [17], except that the $\sqrt{r}$ will disappear and $\log r$ will be
replaced by $\log n$. The conclusion is that we can take $k_1$ to be

$k_1 = O\left( (\log n)(\sqrt{\log n})(\sqrt{\log k}) \right) = O\left( (\log^{3/2} n)(\log^{1/2} k) \right)$

as required. □

3. Random Projections 

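Our main result claims that the same construction used by Rudelson et al. also
gives improved bounds for random projections. In what follows, we fix $r$ to be
$\lceil \delta^{-2} \log N \rceil$ and $\alpha$ to be $1/\sqrt{r}$. Additionally,
we assume that $\Phi$ is such that

(3.1)    $E_\alpha = O(\alpha^2) ,$

where (3.1) is read for the fixed $\Phi$, that is, as a bound on the supremum
inside the expectation in (2.1). Indeed, Theorem 2.1, combined with Markov's
inequality, guarantees that this holds with probability at least 0.99 in $\Omega$.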



⁷ To be exact, in [17] they use the expression $(1 + 2K/u)^r$ and not
$(1 + 2/u)^r$, but the parameter $K$ in their work can be taken as 1 for our
purposes.






Theorem 3.1. Let $Y \subseteq B_2$ denote a set of cardinality $N$, and let $\Phi$
satisfy (3.1). With probability at least 0.98 (in $\Gamma$) we have the following
uniform bound for all $y \in Y$:

$1 - O(\delta) \le \frac{1}{\sqrt{k}} \|\Phi D_y b\| \le 1 + O(\delta) .$

We provide some intuition for the proof. We split each input vector in $Y$ into
a sum of two vectors, one of which is $r$-sparse and the other with $\ell_\infty$
norm bounded by $1/\sqrt{r}$. We use Rudelson et al.'s original result for the
sparse part and our generalization of it (Theorem 2.1), together with Talagrand's
measure concentration theorem, for the $\ell_\infty$-bounded part.
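The split just described is simple to state in code. The following Python sketch (the name `split_head_tail` is ours) separates a vector into its $r$ largest-magnitude coordinates and the rest; ties are broken by index order, which is an arbitrary choice of this sketch.

```python
def split_head_tail(y, r):
    """Return (head, tail): head keeps the r largest-magnitude coordinates of y,
    tail keeps the remaining ones, so that head + tail == y coordinate-wise."""
    order = sorted(range(len(y)), key=lambda i: abs(y[i]), reverse=True)
    keep = set(order[:r])
    head = [v if i in keep else 0.0 for i, v in enumerate(y)]
    tail = [0.0 if i in keep else v for i, v in enumerate(y)]
    return head, tail
```

For any `y` of Euclidean norm at most 1, every coordinate of `tail` has absolute value at most `r ** -0.5`: if some remaining coordinate were larger, the `r` kept coordinates together with it would already have squared norm exceeding 1.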

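Proof. Let $r$ and $\alpha$ be as fixed at the beginning of this section. For each
$y \in Y$ we write $y = \hat{y} + \check{y}$, where $\hat{y}$ is the restriction
of $y$ to its $r$ largest (in absolute value) coordinates and $\check{y}$ is the
restriction to its remaining coordinates. Note that
$\|y\|^2 = \|\hat{y}\|^2 + \|\check{y}\|^2$, that $\hat{y}$ is $r$-sparse and that
$\|\check{y}\|_\infty \le \alpha$. We have

$\frac{1}{k} \|\Phi D_y b\|^2 = \frac{1}{k} \|\Phi D_{\hat{y}} b\|^2 + \frac{1}{k} \|\Phi D_{\check{y}} b\|^2 + \frac{2}{k}\, b^* D_{\hat{y}} \Phi^* \Phi D_{\check{y}} b .$

For the first term we have
$\frac{1}{\sqrt{k}} \|\Phi D_{\hat{y}} b\| = \|\hat{y}\| \pm O(\delta)$ from the
restricted isometry result of [17] and the fact that $\hat{y}$ is $r$-sparse.

In what follows we will use the bound on $\|\check{y}\|_\infty$ to show that with
high probability,
$\frac{1}{\sqrt{k}} \|\Phi D_{\check{y}} b\| = \|\check{y}\| \pm O(\delta)$. A
similar argument will bound the cross product
$\frac{1}{k} b^* D_{\hat{y}} \Phi^* \Phi D_{\check{y}} b$. Combining the three
gives the desired result that for all $y \in Y$,

$\frac{1}{k} \|\Phi D_y b\|^2 = \|y\|^2 \pm O(\delta) .$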



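We start by analyzing the measure concentration properties of
$\frac{1}{\sqrt{k}} \|\Phi D_{\check{y}} b\|$. Let $X_y$ be the Rademacher random
variable defined by

$X_y = \frac{1}{\sqrt{k}} \|\Phi D_{\check{y}} b\| .$

Let $\mu_y$ denote a median of $X_y$. By Talagrand [20], we have that for all
$t > 0$,

(3.2)    $\Pr[X_y > \mu_y + t] \le \exp\{-C_2 t^2 / \sigma_y^2\}$
(3.3)    $\Pr[X_y < \mu_y - t] \le \exp\{-C_2 t^2 / \sigma_y^2\}$

for some global $C_2$, where
$\sigma_y = \left\| \frac{1}{\sqrt{k}} \Phi D_{\check{y}} \right\| = \left\| \frac{1}{k} D_{\check{y}} \Phi^* \Phi D_{\check{y}} \right\|^{1/2}$.
By the triangle inequality and Equation (3.1) we have
$\sigma_y^2 = \left\| \frac{1}{k} D_{\check{y}} \Phi^* \Phi D_{\check{y}} \right\| \le O(\alpha^2) + \|D_{\check{y}}^2\|$.
Clearly $\|D_{\check{y}}^2\| = \|\check{y}\|_\infty^2 \le \alpha^2$. Hence,
$\sigma_y^2 = O(\alpha^2)$. From the fact that
$\mathbb{E}[X_y^2] = \|\check{y}\|^2$ and using Appendix A and (3.2)-(3.3) we
conclude that
$\|\check{y}\| - O(\sigma_y) \le \mu_y \le \|\check{y}\| + O(\sigma_y)$.
Hence, again using (3.2)-(3.3) with $t = \Theta(\delta)$, so that
$t^2 / \sigma_y^2 = \Omega(\delta^2 r) = \Omega(\log N)$, and union bounding over
the $N$ vectors in $Y$, we conclude that with probability 0.99, uniformly for all
$y \in Y$:

$\|\check{y}\| - O(\delta) \le \frac{1}{\sqrt{k}} \|\Phi D_{\check{y}} b\| \le \|\check{y}\| + O(\delta) .$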
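The identity $\mathbb{E}[X_y^2] = \|\check{y}\|^2$ used above holds for any fixed choice of Hadamard rows, because all entries of $\Phi$ are $\pm 1$ and the signs in $b$ are independent. It can be confirmed exactly on a toy instance by enumerating every sign vector. The sketch below is an illustration under our own naming (`hadamard_row`, `mean_squared_estimate`) and is only feasible for very small $n$, since it visits all $2^n$ sign vectors.

```python
import itertools


def hadamard_row(w, n):
    """Row w of the unnormalized n x n Hadamard matrix (n a power of 2)."""
    return [-1 if bin(w & t).count("1") % 2 else 1 for t in range(n)]


def mean_squared_estimate(rows, y):
    """Average of (1/k) * ||Phi D_y b||^2 over every sign vector b in {-1, 1}^n,
    where Phi consists of the Hadamard rows with the given indices."""
    n, k = len(y), len(rows)
    phi = [hadamard_row(w, n) for w in rows]
    total = 0.0
    for b in itertools.product((-1, 1), repeat=n):
        z = [yi * bi for yi, bi in zip(y, b)]  # the vector D_y b
        total += sum(sum(p * v for p, v in zip(row, z)) ** 2 for row in phi) / k
    return total / 2 ** n
```

Calling `mean_squared_estimate(rows, y)` for any list of row indices returns the squared Euclidean norm of `y` up to floating-point rounding; the concentration statements (3.2)-(3.3) then control how far a single draw of $b$ can deviate from this average.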

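We now bound the cross term
$Z = \frac{1}{k} b^* D_{\hat{y}} \Phi^* \Phi D_{\check{y}} b$ ($y$ is now held
fixed). By disjointness of $\mathrm{supp}(\hat{y})$ and
$\mathrm{supp}(\check{y})$, $\mathbb{E}[Z] = 0$. Decompose $b$ into
$\hat{b} + \check{b}$, where $\mathrm{supp}(\hat{b}) = \mathrm{supp}(\hat{y})$ and
$\mathrm{supp}(\check{b}) = \mathrm{supp}(\check{y})$. For any fixed $\hat{b}$,
the function $Z$ is linear (and hence convex) in $\check{b}$. Also for all
possible values $b'$ of $\hat{b}$, $\mathbb{E}[Z \mid \hat{b} = b'] = 0$. Hence,
again by Talagrand,

(3.4)    $\Pr[Z > \mu_{b'} + t] \le \exp\{-C_2 t^2 / \sigma_{b'}^2\}$
(3.5)    $\Pr[Z < \mu_{b'} - t] \le \exp\{-C_2 t^2 / \sigma_{b'}^2\}$

where $\mu_{b'}$ is a median of $(Z \mid \hat{b} = b')$, and
$\sigma_{b'} = \left\| \frac{1}{k} (b')^* D_{\hat{y}} \Phi^* \Phi D_{\check{y}} \right\|$.
Clearly,

$\sigma_{b'} \le \left\| \frac{1}{\sqrt{k}} \Phi D_{\hat{y}} b' \right\| \cdot \left\| \frac{1}{\sqrt{k}} \Phi D_{\check{y}} \right\| = O(\|\hat{y}\| \sigma_y) = O(\sigma_y) = O(\alpha) .$

Again using Appendix A and $\mathbb{E}[Z \mid \hat{b} = b'] = 0$ gives that
$|\mu_{b'}| = O(\alpha)$, and again we conclude using a union bound that with
probability at least 0.99, uniformly for all $y \in Y$,
$\left| \frac{1}{k} b^* D_{\hat{y}} \Phi^* \Phi D_{\check{y}} b \right| = O(\delta)$.

Tying it all together, we conclude that with probability at least 0.98, uniformly
for all $y \in Y$,

$\frac{1}{k} \|\Phi D_y b\|^2 = \frac{1}{k} \|\Phi D_{\hat{y}} b\|^2 + \frac{1}{k} \|\Phi D_{\check{y}} b\|^2 + \frac{2}{k}\, b^* D_{\hat{y}} \Phi^* \Phi D_{\check{y}} b = \|y\|^2 \pm O(\delta)$

as required. □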



4. Conclusions 

The obvious problems left open are those of (1) improving the dependence of
$k$ on $\delta$ (from $\delta^{-4}$ to $\delta^{-2}$) and (2) removing the
dependence of $k$ on $\mathrm{polylog}(n)$. Other directions of research include
reducing not only the computational cost of random dimension reduction, but also
the amount of randomness needed for the construction.
Appendix A.

Fact A.1. For any real valued random variable $Z$ such that for all $t > 0$

(A.1)    $\Pr[Z > \mu + t] \le \exp\{-c t^2 / \sigma^2\}$
         $\Pr[Z < \mu - t] \le \exp\{-c t^2 / \sigma^2\}$

we have that
$\sqrt{\mathbb{E}(Z^2)} - O(\sigma) \le \mu \le \sqrt{\mathbb{E}(Z^2)} + O(\sigma)$.

Proof. Define the variable $Z' = (Z - \mu)/\sigma$. Then

$\mathbb{E}[Z'] \le \mathbb{E}[|Z'|] \le \sum_{i=1}^{\infty} i \Pr(i - 1 \le |Z'| \le i) \le \sum_{i=1}^{\infty} i \Pr(|Z'| \ge i - 1) \le 2 \sum_{i=1}^{\infty} i \exp\{-c (i-1)^2\} = O(1) .$

Clearly, $\mathbb{E}[Z'] = O(1)$ gives $\mathbb{E}(Z) = \mu + O(\sigma)$. In the
same way we get $\mathbb{E}[Z'^2] = O(1)$. Thus,
$\mathbb{E}[Z^2] - 2\mu \mathbb{E}[Z] + \mu^2 = O(\sigma^2)$ and
$\mathbb{E}[Z^2] = (\mu \pm O(\sigma))^2$.